Runbook for new Ephemeral Hotplug Volume Metric#328
Conversation
6604025 to
734882a
Compare
|
|
||
| To diagnose the cause of this alert, the following steps can be taken: | ||
|
|
||
| 1. Find the affected VM(s) and volume(s) that are reported in the Alert. |
There was a problem hiding this comment.
Find the affected VM(s)
@Dsanatar may want to how to list all vmis with matching label selector that was added here kubevirt/kubevirt#15815
There was a problem hiding this comment.
I pushed some changes today that make use of an annotation instead of the previous label approach so that we can store the list of volume names without having to worry about the character limit for label values. I figured this would make things easier for when it came time to patching the specs. That being said, since we can't use a label selector, do you think a jq command to filter vms that have the specific annotation would suffice?
There was a problem hiding this comment.
Thanks for pointing that out though, I added the new annotation so that it could be queried and used here. will fix
There was a problem hiding this comment.
I assumed that the label was just a flag so the value would be true or even empty. Downside of annotations is that have to get all vms and filter on the client side rather than having the server filter. Not sure the convenience of having the volume names in an annotation is worth that cost. Let's see what other reviewers think
There was a problem hiding this comment.
yea true, there's definitely a tradeoff here. Before I changed it to an annotation, I was toying around with a script (or a list of commands) we could provide in the runbook to more easily grab all the affected volumes for each vm but the logic was becoming very similar to what we are already doing in the virt-controller so I figured I could just leverage the existing logic instead. If we were to revert back to the label selector method, do you think providing the list impacted vms and instructing users to compare the two specs to find the volumes would be enough?
There was a problem hiding this comment.
What is the status of this. @mhenriks do you approve the runbook?
2cc6c2a to
4793acd
Compare
|
@Dsanatar please add the Jira issue to the description |
akalenyu
left a comment
There was a problem hiding this comment.
the runbook looks good to me
|
|
||
| Once the VM is patched, you can remove the Alert by running: | ||
| ``` bash | ||
| $ kubectl annotate vmi <vmi-name> kubevirt.io/ephemeral-hotplug-volumes- --overwrite |
There was a problem hiding this comment.
Shouldn't this get removed automatically by VM controller?
There was a problem hiding this comment.
Yea I admit this feels a bit hacky. The reason I have this here is because the VM object that is being used to compare volumes against the VMI doesn't seem to get updated to show the new hotplug volume once they are patched to be persistent (the object only shows the new volumes after VM restart for some reason). Perhaps there is a bug in how i'm retrieving the VM from cache?
There was a problem hiding this comment.
yeah this seems like a bug. I think that function looks okay. Have you traced it and it's returning nil?
There was a problem hiding this comment.
removed this mitigation step, will update kubevirt pr to remove the annotation at the controller level
4793acd to
a8e844d
Compare
|
/lgtm |
docs/runbooks/VirtualMachineInstanceHasEphemeralHotplugVolume.md
Outdated
Show resolved
Hide resolved
Signed-off-by: dsanatar <dsanatar@redhat.com>
a8e844d to
8846e25
Compare
…hotplug-runbook Runbook for new Ephemeral Hotplug Volume Metric
What this PR does / why we need it:
Adding new runbook for Alert that is to be merged in kubevirt. Alert is fired when a VMI contains an ephemeral hotplug volume. Runbook is to notify user of future deprecation of the feature and to advise steps on how to convert these ephemeral volumes into persistent volumes.
Kubevirt PR that adds new metric/alert: kubevirt/kubevirt#15815
Jira: https://issues.redhat.com/browse/CNV-69387
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: